Bubble Statistics and Dynamics in Double-Stranded DNA

The dynamical properties of double-stranded DNA are studied in the framework of the
Peyrard-Bishop-Dauxois model using Langevin dynamics. Our simulations are analyzed in terms
of two probability functions describing coherently localized separations ('bubbles') of the
double strand. We find that the resulting bubble distributions are more sharply peaked at the
active sites than found in thermodynamically obtained distributions. Our analysis ascribes
this to the fact that the bubble life-times significantly affect the distribution function.
We find that certain base-pair sequences promote long-lived bubbles and we argue that this
is due to a length scale competition
between the nonlinearity and disorder present in the system.

The role of dynamics in biological function is becoming increasingly clear. Whereas protein
action and binding have traditionally been discussed in terms of static structures, it is now
evident that many functionalities are consequences of dynamics. Because of its biological
importance and structural clarity, DNA constitutes an appropriate system in which to begin to
understand how structure and thermal motion can work together to determine function. In
particular, the identification of biological processes that are regulated by the dynamical
properties of DNA is fundamentally important for understanding its interaction with other
molecules. Key mechanisms are controlled by entropically driven thermal fluctuations, which
cause local dynamical changes in the inter-strand separation ('bubbles') in double-stranded
DNA molecules. Recent theoretical and experimental studies suggest that the base-pair sequence
(structure) determines specific regions in the double strand that are more prone to such
thermally induced strand separation. Most importantly, these studies have demonstrated a
strong correlation between the specific location of large, coherent openings in DNA and
transcription-promoting regions of the DNA sequence for several well-characterized viral
sequences. It has also been found that the DNA dynamics in the presence of UV-induced dimers
between two adjacent thymine bases (TT-dimers) is dramatically altered in the neighborhood of
the dimer, suggesting an enhancing role for the large fluctuations present at the dimer site
in the dimer recognition pathway. In both cases the theoretical characterization has been
provided by the Peyrard-Bishop-Dauxois (PBD) model. However, recently it was argued that a
thermodynamic characterization of the thermal fluctuations may differ from a dynamical
characterization, which points to the need for a thorough understanding of the dynamical
effects in this highly nonlinear and cooperative material.

Here, we use finite-temperature Langevin simulations to probe the impact of sequence
heterogeneity on bubble dynamics in six different sequences, all composed of 69 base pairs:
i) two homogeneous sequences composed purely of thymine-adenine (T-A) and guanine-cytosine
(G-C) base pairs, respectively; ii) two specific heterogeneous sequences, the adeno-associated
viral (AAV) P5 promoter and a mutated AAV P5 promoter obtained from the wild-type AAV promoter
by replacing two specific neighboring A-T base pairs with G-C base pairs (see Ref. [3] for
details); and iii) two periodic sequences, each containing 35 T-A base pairs and 34 G-C base
pairs, that have different periodicities: $G_1A_1$, with a period of 2 base pairs, and
$G_5A_5$, with a period of 10 base pairs. We compare our results to thermodynamic results for
the inter-strand DNA opening obtained with the same model and observe several important
differences. We emphasize that the Langevin dynamics of the PBD model's principal degrees of
freedom for DNA base pairs is necessarily a phenomenological representation of DNA's full
complexity: microscopic fine scales of, e.g., water motions are not explicitly modeled.
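The composition of the homogeneous and periodic sequences can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the helper name `periodic_sequence` is hypothetical, "A" stands for a T-A pair and "G" for a G-C pair, and starting each periodic sequence with "A" is an assumption, chosen because it reproduces the stated 35/34 split over 69 base pairs.

```python
def periodic_sequence(block: int, length: int = 69, first: str = "A") -> str:
    """Alternate runs of `block` identical bases, starting with `first`.

    "A" denotes a T-A base pair and "G" a G-C base pair (labeling assumption).
    """
    second = "G" if first == "A" else "A"
    return "".join(first if (i // block) % 2 == 0 else second for i in range(length))


homogeneous_at = "A" * 69          # purely T-A
homogeneous_gc = "G" * 69          # purely G-C
g1a1 = periodic_sequence(block=1)  # period of 2 base pairs
g5a5 = periodic_sequence(block=5)  # period of 10 base pairs

for name, seq in [("G1A1", g1a1), ("G5A5", g5a5)]:
    print(name, "T-A pairs:", seq.count("A"), "G-C pairs:", seq.count("G"))
```

Both periodic sequences contain 35 T-A and 34 G-C pairs under this phase choice; the AAV P5 sequences are not reproduced here because they have to be taken from the cited reference.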

In the framework of the PBD model, the thermal dynamics of the $n$th base pair is obtained
through

\[
m\ddot{y}_n = -V'(y_n) - W'(y_{n+1},y_n) - W'(y_n,y_{n-1}) - m\gamma\dot{y}_n + \xi_n(t), \qquad (1)
\]

where

\[
V(y) = D_n\left(e^{-a_n y} - 1\right)^2
\]

is an on-site Morse potential modeling the hydrogen bonding of complementary bases, with
site-dependent parameters representing the exact sequence, and

\[
W(x,y) = \frac{k}{2}\left(1 + \rho e^{-\beta(x+y)}\right)(x-y)^2
\]

represents the nonlinear stacking interactions. Here the prime denotes differentiation with
respect to $y_n$, $\gamma$ is the friction constant, and the random force $\xi_n(t)$ is
Gaussian distributed white noise. With the parameter values used (see Refs. [...]), the
success of the model in describing the base-pair openings of double-stranded DNA has been
demonstrated by direct comparison with various experiments on the melting transition [12],
S1 nuclease digestion, pre-denaturation bubbles [13], and forced unzipping [14].
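To make Eq. (1) concrete, the following is a minimal sketch of a Langevin integration of the PBD chain with periodic boundary conditions. It is not the authors' code. The parameter values are assumptions: they are the values commonly quoted for the PBD model, whereas the text only defers to its references. The integration scheme (a simple semi-implicit Euler-Maruyama step) and the names `forces`, `energy` and `simulate` are likewise illustrative choices.

```python
import numpy as np

# Assumed PBD parameters (units: eV, Angstrom, ps); not taken from the text.
D_AT, D_GC = 0.05, 0.075          # Morse depths [eV]
A_AT, A_GC = 4.2, 6.9             # Morse inverse widths [1/A]
K, RHO, BETA = 0.025, 2.0, 0.35   # stacking: k [eV/A^2], rho, beta [1/A]
MASS = 300.0 * 1.0364269e-4       # 300 amu expressed in eV ps^2 / A^2
GAMMA = 0.05                      # friction constant [1/ps]
KB = 8.617333e-5                  # Boltzmann constant [eV/K]


def energy(y, D, a):
    """Total potential energy: sum of V(y_n) and W(y_n, y_{n-1}), periodic chain."""
    y_prev = np.roll(y, 1)
    morse = D * (np.exp(-a * y) - 1.0) ** 2
    stack = 0.5 * K * (1.0 + RHO * np.exp(-BETA * (y + y_prev))) * (y - y_prev) ** 2
    return float(np.sum(morse + stack))


def forces(y, D, a):
    """Deterministic force -V' - W' - W' on every base pair (Eq. 1 without bath terms)."""
    e = np.exp(-a * y)
    f_morse = 2.0 * a * D * e * (e - 1.0)        # equals -V'(y_n)

    def dw_dx(x, z):
        """Partial derivative of W(x, z) with respect to its first argument."""
        ex = RHO * np.exp(-BETA * (x + z))
        d = x - z
        return 0.5 * K * (-BETA * ex * d ** 2 + 2.0 * (1.0 + ex) * d)

    # W is symmetric in its arguments, so both neighbors use the same derivative.
    return f_morse - dw_dx(y, np.roll(y, 1)) - dw_dx(y, np.roll(y, -1))


def simulate(seq, n_steps, dt=0.001, temperature=300.0, seed=0):
    """Integrate Eq. (1); returns an array of shape (n_steps, len(seq)) in Angstrom."""
    rng = np.random.default_rng(seed)
    D = np.array([D_AT if base in "AT" else D_GC for base in seq])
    a = np.array([A_AT if base in "AT" else A_GC for base in seq])
    y = np.zeros(len(seq))
    v = np.zeros(len(seq))
    # Noise amplitude from <xi xi> = 2 m gamma k_B T delta(t - t').
    sigma = np.sqrt(2.0 * GAMMA * KB * temperature * dt / MASS)
    traj = np.empty((n_steps, len(seq)))
    for step in range(n_steps):
        v += dt * (forces(y, D, a) / MASS - GAMMA * v) + sigma * rng.standard_normal(len(seq))
        y += dt * v
        traj[step] = y
    return traj


trajectory = simulate("AAAAAGGGGG" * 2, n_steps=1000)
print(trajectory.shape)
```

The time step of 1 fs is a conservative guess relative to the fastest Morse oscillation; a production run of 1-2 nsec would need a compiled or vectorized-over-replicas implementation rather than this Python loop.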

Here we simulate the dynamics of double-stranded DNA at T = 300 K by numerically integrating
the system of stochastic equations (1), applying periodic boundary conditions. In the presence
of the thermal bath, modeled by the random forces, the creation of a bubble is a stochastic
process most appropriately described in terms of a probability. We define the probability for
bubble existence as:



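\[
P_n(l,tr) = \left\langle \frac{1}{t_s} \sum_{q_n^k(l,tr)=1}^{q_n^{k,\max}(l,tr)} \Delta t\left[q_n^k(l,tr)\right] \right\rangle_M , \qquad (2)
\]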



where $\langle \cdot \rangle_M$ denotes averaging over $M$ (~1000) simulations. The integer
index $q_n^k(l,tr)$ enumerates the bubbles, defined as a double-strand separation of amplitude
$tr > 0.5$ Å spanning $l > 3$ consecutive base pairs beginning at the $n$th base pair in the
$k$th simulation. In practice we bin $P_n(l,tr)$ using bin sizes $l = 1$ and $tr = 0.5$ Å. The
quantity $\Delta t[q_n^k(l,tr)]$ is the existence time of the $q_n^k(l,tr)$th bubble, and
$t_s \approx 1$-$2$ nsec is the duration of a single simulation (bubble life-times are in the
picosecond range and are therefore much shorter; see Fig. 4). Probabilities for bubble
existence at a given site, for all lengths and amplitudes, defined in this way are normalized,
since all possible openings at every step of the simulation are counted.

FIG. 1: Bubble probability, $\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$, for the 69 base-pair
homogeneous A-T sequence (upper) and for the periodic $G_5A_5$ sequence (lower).

The plot of $\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$ obtained from Eq. (2), given in
Fig. 1, demonstrates two clear results:

1) The probabilities for bubble formation in a homogeneous T-A sequence (or in a G-C sequence;
not shown) do not depend on the base-pair index because of translational invariance.

2) For the periodic sequence $G_5A_5$, the probabilities are periodic with the period of the
sequence. We can clearly identify the sources of the bubbles as situated in the AT-rich
half-period of the sequence. In contrast, for the $G_1A_1$ sequence (not shown) we observed
the probabilities to be almost spatially uniform because of the short periodicity: only a
single G-C base pair separates two A-T base pairs. The important observation from these
results is that not only the length of the hot spots (the T-A areas) but also the length of
the "barriers" (the G-C intervals) plays a crucial role for the probability of bubble
existence by restricting, through impedance mismatch, the energy flow from the A-T rich
regions. Clearly bubbles of all lengths have thermodynamic weight, increasing with temperature
and decreasing with bubble size. However, the base-pair inhomogeneity preferentially selects
long-lived bubbles of specific sizes at specific locations. This is a result of length-scale
competition inherent in nonlinear, disordered systems.
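The bookkeeping behind the bubble-existence probability of Eq. (2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: a bubble is taken to be a maximal run of consecutive base pairs whose displacement exceeds the threshold `tr`, runs that wrap around the periodic boundary are ignored for brevity, and the function names `bubble_runs` and `bubble_probability` are hypothetical.

```python
import numpy as np


def bubble_runs(open_mask):
    """Return (start, length) for each maximal run of True in a 1-D boolean array."""
    padded = np.concatenate(([False], open_mask, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)      # exclusive end index of each run
    return list(zip(starts.tolist(), (ends - starts).tolist()))


def bubble_probability(trajs, dt, tr, min_len):
    """Estimate P_n(l, tr) as the fraction of simulated time a bubble exists.

    trajs   : array of shape (M, n_steps, N) with displacements in Angstrom
    returns : array prob[n, l], averaged over the M simulations
    """
    trajs = np.asarray(trajs, dtype=float)
    n_sims, n_steps, n_sites = trajs.shape
    t_s = n_steps * dt                       # duration of one simulation
    prob = np.zeros((n_sites, n_sites + 1))
    for traj in trajs:
        for frame in traj:
            for start, length in bubble_runs(frame > tr):
                if length >= min_len:
                    prob[start, length] += dt   # accumulate existence time
    return prob / (t_s * n_sims)


# Synthetic example: a 3-site bubble at sites 1..3 exists in 2 of 4 frames.
demo = np.zeros((1, 4, 6))
demo[0, 1:3, 1:4] = 2.0
p = bubble_probability(demo, dt=0.1, tr=1.5, min_len=3)
print(p[1, 3])
```

Summing `prob` over the length axis, or repeating the call for a grid of `tr` values, gives the kind of marginal distributions plotted in the figures; the binning in `tr` used in the text is left to the caller here.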



FIG. 2: Bubble probability, $\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$, for the 69 base-pair
AAV P5 promoter (see Ref. [3]), with $t_s = 2$ nsec and $M = 800$ simulations.



In Fig. 2 we show $\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$ for the 69 base-pair
adeno-associated viral (AAV) P5 promoter (see Ref. [3]). Two regions are prominent in terms of
the largest probabilities for bubble existence. These regions are located around base pairs +1
and -30, which have previously been identified as the transcription start site (TSS) and the
binding site for the TATA-binding protein (TBP), respectively. It is noteworthy that the
probability becomes more localized around the identified sites as increasingly longer (in
terms of consecutive sites) bubbles are considered. This result is in agreement with findings
in similar simulations investigating the frequency of base-pair opening. There is also good
agreement with the thermodynamic results derived from the PBD model; however, the active
regions are much more sharply identified here than in the thermodynamic treatment.

Our results confirm that the bubble localized at the transcription promoter site can aid the
RNA polymerase and the associated proteins in the formation of the transcription bubble.

It has been argued that the results obtained in Ref. [3] are flawed by insufficient statistics
in the simulations. We therefore present in Fig. 2 the result of $M = 800$ simulations of
$t_s = 2$ nsec duration. Similar results were obtained from simulations of $t_s = 1$ nsec
duration (not shown), suggesting that the statistics are indeed sufficient even in a 1 nsec
simulation. It is important to note that even for the 2 nsec simulations we never observed
complete melting of all 69 base pairs, indicating that we are exclusively sampling the
premelting regime.




FIG. 3: Bubble probability, $\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$, for a mutated AAV P5
sequence (see text).



In Fig. 3 we similarly show $\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$ for a mutated AAV P5.
The mutation, which has severe consequences for the promoter's ability to induce transcription
(see Ref. [3]), consists of changing base pairs +1 and +2 to G-C pairs. From Fig. 3 we observe
that this mutation indeed severely inhibits the formation of large bubbles around the +1 base
pair. Comparing Figs. 2 and 3, we see that changing just two base pairs is a sufficient
increase of the G-C "barrier" to restrict the flow of thermal energy to be exclusively
downstream in the sequence. This is the mechanism by which the mutation can induce rather
long-range effects, such as the change in the probability around base pair -30, although this
effect may be specific to periodic boundary conditions.

From these simulations we confirm that, for these het- 
erogeneous sequences, the maxima of the PDFs corre- 
spond to biologically important sites in the sequence, and 
that even small changes of the sequence can lead to sig- 
nificant changes in the spatio-temporal probabilities. 

In order to shed more light on the role of the life-time 
of the bubbles, we calculate a distribution function for 
the average bubble duration (ABD): 



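\[
\mathrm{ABD}(n,l,tr) = \left\langle \frac{\sum_{q_n^k(l,tr)=1}^{q_n^{k,\max}(l,tr)} \Delta t\left[q_n^k(l,tr)\right]}{q_n^{k,\max}(l,tr)} \right\rangle_M , \qquad (3)
\]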



where the denominator is the total number of bubbles, with strand separation $tr$, in the
$k$th simulation, spanning $l$ base pairs beginning at the $n$th base pair. It is important to
emphasize that the information contained in the ABD cannot be accessed through any
thermodynamic considerations. In Fig. 4 we show the quantity ABD($n$, $l$, $tr$), with
$tr = 4.5$ Å, for the AAV P5 sequence as well as for the mutated AAV P5 sequence. The
immediate observation is that the wild-type version of the AAV P5 sequence overall supports
bubbles of significantly longer duration than the mutated version. This is particularly true
for bubbles of large strand separation. As documented by Fig. 3, the mutated AAV P5 certainly
supports a number of large bubbles, but their duration is significantly shorter. Also, Fig. 4
shows that the region around base pair +1 in the wild-type sequence supports large-amplitude,
long-lived bubbles, a feature that is completely absent in the mutated sequence.

FIG. 4: Average bubble duration time, ABD($n$, $l$, $tr$) ($tr = 4.5$ Å), for the AAV P5
(upper) and for the mutated AAV P5 (lower) sequences.
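The average bubble duration of Eq. (3) requires following each bubble through time rather than counting frames. The sketch below is one way to do this and is not the authors' code: a bubble "event" is assumed to be an uninterrupted sequence of frames in which the same (start site, length) run stays above the threshold, simulations in which no such bubble occurs are skipped in the average, and the names `bubble_durations` and `average_bubble_duration` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def bubble_runs(open_mask):
    """Return (start, length) for each maximal run of True in a 1-D boolean array."""
    padded = np.concatenate(([False], open_mask, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), (ends - starts).tolist()))


def bubble_durations(traj, dt, tr, min_len):
    """Map (start, length) -> list of life-times of uninterrupted bubble events."""
    durations = defaultdict(list)
    alive = {}                                   # (start, length) -> frames so far
    for frame in traj:
        present = {run for run in bubble_runs(frame > tr) if run[1] >= min_len}
        for key in list(alive):
            if key not in present:               # bubble closed or changed shape
                durations[key].append(alive.pop(key) * dt)
        for key in present:
            alive[key] = alive.get(key, 0) + 1
    for key, frames in alive.items():            # bubbles still open at the end
        durations[key].append(frames * dt)
    return durations


def average_bubble_duration(trajs, dt, tr, min_len):
    """ABD per (start, length): per-simulation mean life-time, averaged over simulations."""
    per_sim_means = defaultdict(list)
    for traj in trajs:
        for key, times in bubble_durations(traj, dt, tr, min_len).items():
            per_sim_means[key].append(sum(times) / len(times))
    return {key: sum(vals) / len(vals) for key, vals in per_sim_means.items()}


# Synthetic example: the same 3-site bubble lives for 2 frames, closes, then lives for 3.
demo = np.zeros((6, 5))
demo[[0, 1, 3, 4, 5], 1:4] = 5.0
abd = average_bubble_duration([demo], dt=0.1, tr=4.5, min_len=3)
print(abd[(1, 3)])
```

Treating any change of start site or length as the end of one bubble and the birth of another is a strict convention; a more tolerant tracking rule would yield longer life-times, so the choice should be stated alongside any reported ABD values.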

To compare the probability for bubble existence for all simulated sequences, we show in Fig. 5
the quantities $\sum_{n=-46}\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$ (upper) and
$\sum_{n=-46}\sum_{l} P_n(l,tr)$ (lower). In the two plots we show results for homogeneous A-T
and G-C sequences, together with AAV P5 and its mutated version. Also shown are results for
the two periodic sequences ($G_1A_1$) and ($G_5A_5$). All probabilities decrease exponentially
with size (amplitude and length), rendering large bubbles rare dynamical events. As is natural
given the softer A-T potential, the probability for any bubble is always largest in a
homogeneous A-T sequence and lowest in a homogeneous G-C sequence. Comparing the results for
homogeneous, periodic and heterogeneous sequences, it is clear that the bubble probability
depends very little on sequence, but mainly on the AT and GC content. Finally, we observe that
the periodic ($G_5A_5$) sequence has a lower probability for longer bubbles than the sequence
with the smaller period ($G_1A_1$) and than the heterogeneous AAV P5 sequence, confirming the
role of the G-C "barriers" in impeding energy flow along the sequence.

FIG. 5: Bubble length probability (log scale),
$\sum_{n=-46}\sum_{tr=1.5\,\text{Å}}^{\infty} P_n(l,tr)$ (upper), and bubble amplitude
probability (log scale), $\sum_{n=-46}\sum_{l} P_n(l,tr)$, as a function of amplitude in Å
(lower). The insets compare these for AAV P5 at 1 nsec and 2 nsec simulation times.
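The statement that the probabilities decay exponentially with bubble size can be quantified by a straight-line fit to the logarithm of the summed probabilities. The snippet below is a minimal sketch with synthetic numbers, not data from the simulations; the function name `decay_length` and the functional form $P \propto e^{-s/\xi}$ with a single decay constant $\xi$ are assumptions made for illustration.

```python
import numpy as np


def decay_length(sizes, probs):
    """Fit log P = c - size / xi by least squares and return xi.

    Bins with zero probability are dropped because their logarithm is undefined.
    """
    sizes = np.asarray(sizes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    keep = probs > 0
    slope, _intercept = np.polyfit(sizes[keep], np.log(probs[keep]), 1)
    return -1.0 / slope


# Synthetic length distribution with a known decay constant of 2.5 base pairs.
lengths = np.arange(3, 15)
synthetic = 0.3 * np.exp(-lengths / 2.5)
print(decay_length(lengths, synthetic))
```

Comparing the fitted $\xi$ between sequences, or between 1 nsec and 2 nsec runs, gives a single number for how quickly large bubbles become rare; deviations from a straight line at large sizes would indicate where finite-time sampling starts to matter.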

In the case of the bubble amplitude probability, there is a strong dependence on the actual
sequence. The heterogeneous AAV P5 sequence sustains high-amplitude bubbles significantly
better than the periodic sequences ($G_1A_1$) and ($G_5A_5$) with the same AT content. Even
the mutated AAV P5 sequence, with slightly less AT content than the periodic sequences
($G_1A_1$) and ($G_5A_5$), is more likely to sustain bubbles with amplitudes over 4 Å.
Therefore the amplitude of the bubbles is sensitive to the exact sequence. In the
heterogeneous sequences the probability for bubbles with high amplitudes is larger than in the
periodic sequences with the same or even slightly less AT content. This is consistent with the
recent demonstration of melting temperatures being sensitive to intra-sequence correlation
rather than being simply determined by AT and GC content.

The insets of Fig. 5 compare the results of the 1 nsec and 2 nsec simulations for the AAV P5
sequence. Since the results are equivalent up to amplitudes of about 6 Å and bubble lengths of
about 14 base pairs, we conclude that in this sequence the finite-time effects in Langevin
simulations exist only beyond these amplitudes and lengths.

In summary, we have performed Langevin simulations of the PBD model of DNA and confirmed
earlier results regarding the sequence dependence of bubble formation, in agreement with
results obtained on a purely thermodynamic basis. However, we find that the dynamics more
sharply delineates the regions active for thermal strand separation because the life-times of
bubbles are directly accounted for. We find that the probability for larger bubbles (lengths
and amplitudes) is higher for heterogeneous than for periodic sequences with the same A-T
content. The important role of the length of the G-C "barriers" for bubble existence was
identified. We find that the bubbles with maximum duration begin their existence at
biologically significant sites, and that these bubble initiation sites are different for
bubbles with different amplitudes. Finally, we found a striking sensitivity of the bubble
life-time to sequence. Therefore we suggest that DNA's ability to sustain bubbles in some
regions is a result of competition between length scales arising from the nonlinearity and the
sequence heterogeneity, and that this competition sensitively controls the bubble life-times.
Since specific biological functions are likely to be aided by long-lived openings of specific
sizes, this information regarding size-lifetime relationships is directly relevant.